CONFIRM - Clustering Of Noisy Form Images using Robust Metrics

Authors

  • Chris Tensmeyer
  • Tony Martinez
Abstract

The ability to automatically cluster large collections of noisy form images by form type would improve the efficiency of organizations that currently sort them by hand. Some noisy form collections contain form types that are structurally very similar but should still be clustered apart. To address this issue, we propose CONFIRM (Clustering Of Noisy Form Images using Robust Metrics). CONFIRM uses a novel technique to match form text and rule lines to create vector representations of each form. A Random Forest classifier is then used to learn a pairwise similarity metric for use in Spectral Clustering. Validation is provided on the NIST tax forms as well as several historical forms datasets.
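The last two steps of the abstract (a learned pairwise similarity fed into Spectral Clustering) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes scikit-learn, it assumes each pair of forms is described by the absolute difference of their vectors, and the function names `pair_features`, `learn_similarity`, `similarity_matrix`, and `cluster_forms` are hypothetical. Synthetic vectors stand in for the representations that CONFIRM builds from matched text and rule lines.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.ensemble import RandomForestClassifier


def pair_features(X, idx_a, idx_b):
    """Describe each pair by the absolute difference of its two vectors (an assumption)."""
    return np.abs(X[idx_a] - X[idx_b])


def learn_similarity(X_train, y_train, seed=0):
    """Fit a Random Forest that predicts whether two forms share a form type."""
    idx_a, idx_b = np.triu_indices(len(X_train), k=1)  # each unordered pair once
    same_type = (y_train[idx_a] == y_train[idx_b]).astype(int)
    forest = RandomForestClassifier(n_estimators=50, random_state=seed)
    forest.fit(pair_features(X_train, idx_a, idx_b), same_type)
    return forest


def similarity_matrix(forest, X):
    """Build a symmetric matrix of predicted same-type probabilities."""
    n = len(X)
    idx_a, idx_b = np.triu_indices(n, k=1)
    proba = forest.predict_proba(pair_features(X, idx_a, idx_b))
    same_col = list(forest.classes_).index(1)  # column for the "same type" class
    S = np.eye(n)
    S[idx_a, idx_b] = proba[:, same_col]
    S[idx_b, idx_a] = proba[:, same_col]
    return S


def cluster_forms(S, n_clusters, seed=0):
    """Run Spectral Clustering on the learned similarity matrix."""
    affinity = np.maximum(S, 1e-6)  # small floor keeps the graph connected
    model = SpectralClustering(
        n_clusters=n_clusters, affinity="precomputed", random_state=seed
    )
    return model.fit_predict(affinity)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    centers = rng.normal(scale=3.0, size=(3, 8))  # three synthetic "form types"
    y_train = np.repeat(np.arange(3), 20)
    X_train = centers[y_train] + rng.normal(scale=0.3, size=(60, 8))
    y_new = np.repeat(np.arange(3), 15)
    X_new = centers[y_new] + rng.normal(scale=0.3, size=(45, 8))

    forest = learn_similarity(X_train, y_train)
    labels = cluster_forms(similarity_matrix(forest, X_new), n_clusters=3)
    print(labels)
```

Training on pairs rather than on single forms means the classifier learns what "same type" looks like, so in principle the learned metric can be applied to forms whose types were not labeled in advance.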

Similar resources

A Multi-Objective Approach to Fuzzy Clustering using ITLBO Algorithm

Data clustering is one of the most important areas of research in data mining and knowledge discovery. Recent research in this area has shown that the best clustering results can be achieved using multi-objective methods. In other words, assuming more than one criterion as objective functions for clustering data can measurably increase the quality of clustering. In this study, a model with two ...

Definition of General Operator Space and The s-gap Metric for Measuring Robust Stability of Control Systems with Nonlinear Dynamics

In recent decades, metrics have been introduced as mathematical tools for determining the robust stability of closed-loop control systems. However, the drawback of these metrics is their limited applicability to closed-loop control systems with nonlinear dynamics. As a solution, the literature suggests applying the metric theories to linearized models. In this paper, we show that usin...

An Information-Theoretic Discussion of Convolutional Bottleneck Features for Robust Speech Recognition

Convolutional Neural Networks (CNNs) have demonstrated their effectiveness in speech recognition systems for both feature extraction and acoustic modeling. In addition, CNNs have been used for robust speech recognition, and competitive results have been reported. The Convolutive Bottleneck Network (CBN) is a kind of CNN that has a bottleneck layer among its fully connected layers. The bottleneck fea...

A robust segmentation approach for noisy medical images using fuzzy clustering with spatial probability

Image segmentation plays a major role in medical imaging applications. During the last decades, developing robust and efficient algorithms for medical image segmentation has been a demanding area of growing research interest. The renowned unsupervised clustering method, the Fuzzy C-Means (FCM) algorithm, is extensively used in medical image segmentation. Despite its pervasive use, conventional FCM is hi...

A Convolutional Neural Network based on Adaptive Pooling for Classification of Noisy Images

The convolutional neural network is an effective method for classifying images that learns using convolutional, pooling, and fully connected layers. All kinds of noise disrupt the operation of this network. Noisy images reduce classification accuracy and increase convolutional neural network training time. Noise is an unwanted signal that destroys the original signal. Noise chang...

Publication date: 2015